class: title-slide, right, top
background-image: url(img/axsome_logo.png)
background-size: 40%, cover
.right-column[
# Module 2: Back to the Basics
### A Quick Stroll Through Base R

**Graham Eglit**<br>
Axsome Therapeutics<br>
Fall 2024
]

---
class: inverse center middle

# It's Just a Fancy Calculator

----

<svg viewBox="0 0 581 512" style="position:relative;display:inline-block;top:.1em;fill:white;height:3em;" xmlns="http://www.w3.org/2000/svg"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>

---
.center[
# Simple Arithmetic
]

.pull-left[
- Performing arithmetic in R is simple and straightforward

- R handles addition, subtraction, multiplication, and division:
    - `+`
    - `-`
    - `*`
    - `/`

- Exponents use `^`, and square roots use `sqrt()`

- R can also perform matrix algebra and some calculus, but we won't focus on that
]

.pull-right[
```r
2 + 2
```

```
## [1] 4
```

```r
2 * 2
```

```
## [1] 4
```

```r
2^2
```

```
## [1] 4
```

```r
sqrt(2)
```

```
## [1] 1.414214
```
]

---
.center[
# Storing Objects
]

.pull-left[
- To call up a result in the future, we must store it as an object

- Objects are created using the assignment operator: `<-`
    - `=` can also create objects through assignment, but I prefer to reserve `=` for other uses, such as setting function arguments

- Once an object has been created, it is stored in the environment and can be called up later or further manipulated
]

.pull-right[
```r
x <- 2 + 2
```

```r
x
```

```
## [1] 4
```

<br>

.center[<img src="img/mod2/environment.png" width="65%"/>]
]

---
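.center[
# Reusing Stored Objects
]

A minimal sketch, not part of the original slides: once stored, an object can be reused in later calculations or overwritten by assigning a new value to the same name. The object name `result` is hypothetical (chosen so it doesn't clobber `x`, which later slides reuse); `ls()` is the base R function that lists the objects in the current environment.

```r
# Store the result of a calculation as an object (hypothetical name)
result <- 2 + 2

# Reuse the stored value in a new calculation
result * 10       # 40

# Assigning to an existing name overwrites the old value
result <- result + 1
result            # 5

# List the names of the objects stored in the current environment
ls()
```

Because assignment silently replaces whatever was stored under that name, pick distinct names for results you want to keep.

---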
.center[
# Storing More Complex Objects
]

.pull-left[
- Objects can store more complex information than a single numeric value
    - For example, we can store a **vector** of numeric values as an object

- As before, the key is to use the `<-` operator
    - But now we'll add the `c` function
        - `c` stands for concatenate and tells R to combine the values into a vector
    - We'll use `c` and `<-` A LOT, so make sure to remember them! ⭐

- In the example to the right, we created a vector consisting of 1, 2, 3, 4, and 5
    - Without `c`, R cannot tell how to combine the comma-separated values and returns an error
    - The `:` operator is a shortcut: `1:5` produces every integer from 1 to 5, including both endpoints
]

.pull-right[
```r
y <- c(1, 2, 3, 4, 5)
y
```

```
## [1] 1 2 3 4 5
```

```r
a <- 1, 2, 3, 4, 5
```

```
## Error: <text>:1:7: unexpected ','
## 1: a <- 1,
##          ^
```

```r
z <- 1:5
z
```

```
## [1] 1 2 3 4 5
```
]

---
.center[
# Arithmetic with Vectors
]

.pull-left[
- You can also perform arithmetic using vectors

- **Element-wise execution**: R applies the operation to each pair of corresponding elements and returns an object (e.g., vector, matrix, etc.)
of the same dimensions as the original objects
    - R lines up the vectors and performs a sequence of individual operations
]

.pull-right[
```r
y
```

```
## [1] 1 2 3 4 5
```

```r
z
```

```
## [1] 1 2 3 4 5
```

```r
y * z
```

```
## [1]  1  4  9 16 25
```
]

--

.pull-left[
- **Vector recycling**: If one vector is shorter than the other, R recycles (repeats) the shorter vector until it is as long as the longer one
    - Here, the single value stored in `x` (4) is multiplied by every element of `z`
    - If the longer length is not a multiple of the shorter one, R still recycles but gives a warning
]

.pull-right[
```r
x * z
```

```
## [1]  4  8 12 16 20
```
]

---
.center[
# Arithmetic Functions
]

.pull-left[
<table>
<thead>
<tr> <th style="text-align:left;"> Function </th> <th style="text-align:left;"> Operation </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;background-color: white !important;"> mean </td> <td style="text-align:left;background-color: white !important;"> Mean </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> median </td> <td style="text-align:left;background-color: white !important;"> Median </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> sd </td> <td style="text-align:left;background-color: white !important;"> Standard Deviation </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> var </td> <td style="text-align:left;background-color: white !important;"> Variance </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> max </td> <td style="text-align:left;background-color: white !important;"> Largest Element </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> min </td> <td style="text-align:left;background-color: white !important;"> Smallest Element </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> log </td> <td style="text-align:left;background-color: white !important;"> Natural Log </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> exp </td> <td style="text-align:left;background-color: white !important;"> Exponential
</td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> round </td> <td style="text-align:left;background-color: white !important;"> Round to n Decimal Places </td> </tr>
<tr> <td style="text-align:left;background-color: white !important;"> cor </td> <td style="text-align:left;background-color: white !important;"> Correlation </td> </tr>
</tbody>
</table>
]

.pull-right[
```r
mean(y)
```

```
## [1] 3
```

```r
min(y)
```

```
## [1] 1
```

```r
var(y)
```

```
## [1] 2.5
```

```r
round(var(y), digits = 0)
```

```
## [1] 2
```
]

---
.center[
# Anatomy of a Function
]

.pull-left[
- **Functions** perform actions on objects

- To use a function, write its name followed by the object you want it to operate on in parentheses
    - e.g., `round(y)`

- Everything inside the parentheses is an **argument** of the function
    - Typically, the first argument is the object the function should operate on (often referred to as the data)
    - But functions can have many arguments, like the `digits` argument we saw earlier

- For arithmetic functions, missing-data arguments are important! ⭐
    - If any value is `NA`, functions like `mean` return `NA` unless you set `na.rm = TRUE`
]

.pull-right[
```r
x_miss <- c(1, 2, NA, 4, 5)
x_miss
```

```
## [1]  1  2 NA  4  5
```

```r
mean(x_miss)
```

```
## [1] NA
```

```r
mean(x_miss, na.rm = TRUE)
```

```
## [1] 3
```
]

---
.center[
# What Have We Covered So Far?
]

.pull-left[
- R can perform basic arithmetic
<br>
<br>
- We can create vectors of numbers
<br>
<br>
- We can use element-wise execution to perform arithmetic on multiple vectors
<br>
<br>
- We can use base R functions to perform computations
<br>
<br>
- Functions have arguments that we specify
<br>
<br>
- Be wary of missing data!
]

.pull-right[
.center[
<br>
<img src="img/mod2/review.jpg" height="266px" width="399px" />
<figcaption> Photo by Markus Winkler </figcaption>
]
]

---
.center[
# Let's Try an Exercise!
]

.pull-left[
1. Create a vector of values 4, 8, 2, 4, 9 and store it as an object "a"
<br>
<br>
1. Create a second vector of values 7, 4, 5, 9, 2 and store it as an object "b"
<br>
<br>
1. Multiply a and b and store the result as an object "d"
<br>
<br>
1. Find the mean of d
<br>
<br>
1. Find the variance of d
<br>
<br>
1. Round the variance of d to 0 decimal places
<br>
<br>
]

.pull-right[
02:30
]

---
.center[
# Solution
]

.pull-left[
```r
a <- c(4, 8, 2, 4, 9)
a
```

```
## [1] 4 8 2 4 9
```

```r
b <- c(7, 4, 5, 9, 2)
b
```

```
## [1] 7 4 5 9 2
```

```r
d <- a * b
d
```

```
## [1] 28 32 10 36 18
```
]

.pull-right[
```r
mean(d)
```

```
## [1] 24.8
```

```r
var(d)
```

```
## [1] 113.2
```

```r
round(var(d), digits = 0)
```

```
## [1] 113
```
]

---
class: inverse middle center

# A Brief Interruption to Our Regularly Scheduled Programming

----

<svg viewBox="0 0 640 512" style="position:relative;display:inline-block;top:.1em;fill:white;height:3em;" xmlns="http://www.w3.org/2000/svg"> <path d="M592 0H48A48 48 0 0 0 0 48v320a48 48 0 0 0 48 48h240v32H112a16 16 0 0 0-16 16v32a16 16 0 0 0 16 16h416a16 16 0 0 0 16-16v-32a16 16 0 0 0-16-16H352v-32h240a48 48 0 0 0 48-48V48a48 48 0 0 0-48-48zm-16 352H64V64h512z"></path></svg>

---
.center[
# Help!
]

.left-column[
.center[
<br>
<br>
<br>
<br>
<img src = "https://media.giphy.com/media/fdLR6LGwAiVNhGQNvf/giphy.gif" />
.caption[
Via [Giphy](https://media.giphy.com/media/fdLR6LGwAiVNhGQNvf/giphy.gif)
]
]
]

.right-column[
- To learn more about a function, use its help documentation
    - `?mean`
    - `help(mean)`

- Help documentation consists of the following sections:
    - *Description*: A short summary of what the function does
    - *Usage*: How the function is called, including its arguments and their default values
    - *Arguments*: A list of each argument the function takes
    - *Details*: A more in-depth description of the function and how it operates
    - *Value*: A description of what the function returns when you run it
    - *Examples*: Additional examples of the function in use

- CRAN package pages (e.g., [gt](https://CRAN.R-project.org/package=gt)) link to package-wide reference manuals, vignettes/tutorials, GitHub repos, and bug report pages

- When in doubt, search [Stack Overflow](https://stackoverflow.com/)
]

---
.center[
# Programming Miscellany
]

.pull-left[
1. R is **case sensitive**
    - e.g., `mean` does not equal `Mean`
<br>
<br>
1. Object/variable names cannot begin with a number and cannot include certain special characters
    - e.g., ^, !, $, @, +, -, /, or *
<br>
<br>
1. Use `#` to comment code
    - R ignores everything written after `#` on that line
<br>
<br>
1. Spaces are not allowed in object names
    - Use `_` instead (e.g., `x_miss`)
]

.pull-right[
```r
Mean(c(1, 2, 3))
```

```
## Error in Mean(c(1, 2, 3)): could not find function "Mean"
```

<br>
<br>
<br>

```r
# mean(c(1, 2, 3))
```

<br>

```r
x miss
```

```
## Error: <text>:1:3: unexpected symbol
## 1: x miss
##      ^
```
]

---
.center[
# Debugging
]

.panelset[
.panel[.panel-name[Debugging Messages]

- <span style = 'color: red; font-weight: bold;'>Errors</span>: Text will begin with "*Error*" (usually "*Error in...*") and include a description of the error
    - Your code will not run
    - These messages cannot be ignored

- <span style = 'color: yellow; font-weight: bold;' >Warnings</span>: Text will begin with "*Warning*" and be followed by an explanation of the cause of the warning
    - Your code will probably still run, but there may be issues
    - Evaluate the warning and determine whether changes are necessary

- <span style = 'color: green; font-weight: bold;' >Messages</span>: Text will not begin with "*Error*" or "*Warning*"
    - These are helpful diagnostics and will not stop your code from running
]
<!---->
.panel[.panel-name[Error Example]

```r
f <- c(1, 2, 3, 4, 5)
Mean(f)
```

```
## Error in Mean(f): could not find function "Mean"
```

```r
ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()
```

```
## Error in ggplot(mtcars, aes(x = disp, y = mpg)): could not find function "ggplot"
```

- The second error occurs because the ggplot2 package has not been loaded yet
]
<!---->
.panel[.panel-name[Warning Example]
.pull-left[
```r
ggplot(airquality, aes(x = Ozone, y = Temp)) + geom_point()
```

```
## Warning: Removed 37 rows containing missing values (`geom_point()`).
```

- `airquality` has 37 missing `Ozone` values, so those rows cannot be plotted
]

.pull-right[
<!-- -->
]
]
<!---->
.panel[.panel-name[Message Example]
.pull-left[
```r
require(ggplot2)
```

```
## Loading required package: ggplot2
```

```r
ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()
```
]

.pull-right[
<!-- -->
]
]
<!---->
]
<!--end of panelset-->

---
class: center middle hide_logo
background-image: url(img/mod2/quiz.jpg)
background-size: cover

<span style = 'color: red; font-weight: bold; font-size: 120px'>Pop Quiz!</span>

---
.center[
# Calculating Correlations
]

.pull-left[
1. Create a vector of values 42, 75, 38, NA, 10, 120, 32 and name it "a"
1. Create a second vector of values 504, NA, 690, 200, 250, NA, 400 and name it "b"
1. Calculate the Spearman rank correlation of vector "a" and vector "b"
    - Hint: Use the `cor` function

03:00
]

--

.pull-right[
```r
a <- c(42, 75, 38, NA, 10, 120, 32)
b <- c(504, NA, 690, 200, 250, NA, 400)
cor(a, b, use = "complete.obs", method = "spearman")
```

```
## [1] 0.8
```

- `use = "complete.obs"` drops every position where either vector is `NA` before computing the correlation
]

---
class: inverse center middle

# R Objects: More Than Just (Atomic) Vectors

----

<svg viewBox="0 0 581 512" style="position:relative;display:inline-block;top:.1em;fill:white;height:3em;" xmlns="http://www.w3.org/2000/svg"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>

---
.center[
# Atomic Vectors
]

.left-column[
.center[
<br>
<br>
<br>
<img src = "https://media.giphy.com/media/tAPkUnA2GQNVK/giphy.gif" />
.caption[
Via [Giphy](https://media.giphy.com/media/tAPkUnA2GQNVK/giphy.gif)
]
]
]

.right-column[
- **Atomic vector**: a simple vector of data
    - One-dimensional
    - All elements must be of the same type
    - Basis of most data structures in R

- R recognizes six types of atomic vectors
    1. **Doubles**: store regular numbers, can include decimals
    1. **Integers**: store whole numbers
    1. **Characters**: store small pieces of text ("strings")
    1. **Logicals**: store TRUEs and FALSEs
    1. **Complex**: store complex numbers (e.g., imaginary numbers)
    1. **Raw**: store raw bytes of data

- You can check the type of a vector using the `typeof` or `is.*` functions
]

---
.center[
# Double Atomic Vectors
]

.pull-left[
- Stores regular numbers
<br>
<br>
- Can have digits to the right of the decimal place or not
<br>
<br>
- Default type for numbers in R
<br>
<br>
- Sometimes referred to as "numeric" in R
<br>
<br>
- Numbers are stored to about 16 significant digits, so tiny rounding errors can appear (e.g., `sqrt(2)^2 - 2` is not exactly 0)
]

.pull-right[
```r
x_type <- c(1, 2, 3, 4, 5)
typeof(x_type)
```

```
## [1] "double"
```

```r
is.numeric(x_type)
```

```
## [1] TRUE
```

```r
sqrt(2)^2 - 2
```

```
## [1] 4.440892e-16
```
]

---
.center[
# Integer Atomic Vectors
]

.pull-left[
- Stores whole numbers
<br>
<br>
- Cannot have digits to the right of the decimal place
<br>
<br>
- Created by adding an `L` after a number (e.g., `5L`); otherwise R stores the number as a double
<br>
<br>
- Stored exactly, with no floating-point rounding, but limited to values between -2,147,483,647 and 2,147,483,647
<br>
<br>
]

.pull-right[
```r
x_type <- c(1L, 2L, 3L, 4L, 5L)
typeof(x_type)
```

```
## [1] "integer"
```

```r
is.integer(x_type)
```

```
## [1] TRUE
```
]

---
.center[
# Character Atomic Vectors
]

.pull-left[
- Stores text strings
<br>
<br>
- Text must be surrounded by quotes (`""`) to be stored in a vector
<br>
<br>
- Text can be letters, special symbols, or numbers
<br>
<br>
- Anything surrounded by quotes will be treated by R as a character string
<br>
<br>
]

.pull-right[
```r
x_type <- c("You", "Are", "Number", "1", "!")
x_type
```

```
## [1] "You"    "Are"    "Number" "1"      "!"
```

```r
typeof(x_type)
```

```
## [1] "character"
```

```r
is.character(x_type)
```

```
## [1] TRUE
```
]

---
.center[
# Logical Atomic Vectors
]

.pull-left[
- Stores TRUE and FALSE values
<br>
<br>
- Important for logical and Boolean operators
    - e.g., `a > b & a > c`
<br>
<br>
- R will interpret TRUE, FALSE, T, or F as logical data
    - Note the absence of quotations!
    - `T` and `F` are shortcuts that can be overwritten, so spelling out `TRUE` and `FALSE` is safer
]

.pull-right[
```r
x_type <- c(TRUE, T, FALSE, F)
x_type
```

```
## [1]  TRUE  TRUE FALSE FALSE
```

```r
typeof(x_type)
```

```
## [1] "logical"
```

```r
is.logical(x_type)
```

```
## [1] TRUE
```
]

---
.center[
# 💪 Coercion 💪
]

.panelset[
.panel[.panel-name[Coercion Background]
.pull-left[
- **Coercion**: automatic conversion of data types when conflicting types are combined
    - e.g., a double and a character string

- R follows a consistent set of rules for coercion
    1. character and double/integer/logical --> character
    1. double and integer --> double
    1. double and logical --> double
    1. integer and logical --> integer
        - TRUE = 1, FALSE = 0

- Each rule converts to the more flexible type, so no information is lost
]

.pull-right[
<img src = "img/mod2/coercion.png" />
.caption[
Figure from [Hands-On Programming with R](https://rstudio-education.github.io/hopr/)
]
]
]
<!---->
.panel[.panel-name[Coercion In Action]
.pull-left[
```r
typeof(c("Hello", 1, 4, 8))
```

```
## [1] "character"
```

```r
c("Hello", 1, 4, 8)
```

```
## [1] "Hello" "1"     "4"     "8"
```
]

.pull-right[
```r
typeof(c(TRUE, 1, 5, FALSE))
```

```
## [1] "double"
```

```r
c(TRUE, 1, 5, FALSE)
```

```
## [1] 1 1 5 0
```

```r
sum(c(TRUE, FALSE, FALSE, TRUE))
```

```
## [1] 2
```
]
]
<!---->
]
<!--end of panelset-->

---
.center[
# Beyond Atomic Vectors: Matrices!
]

.pull-left[
- Matrices store values in a two-dimensional array
    - like in linear algebra

- Can only store a single type of information (e.g., doubles)

- Use the `matrix` function to create a matrix
<br>

.center[
<img src = "img/mod2/matrix.png" />
.caption[
`matrix` Help Documentation
]
]
]

.pull-right[
```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
m
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```
]

---
.center[
# Beyond Atomic Vectors: Factors!
]

.pull-left[
- Factors store categorical information
    - e.g., ethnicity, sex, etc.

- R stores factors as integers, with a text label (level) attached to each integer
    - Makes them good for using in models
    - But confusing...
        - Looks like a character string
        - But behaves like an integer

- To create a factor, use the `factor` function

- Factors also have a levels attribute
    - This can be specified using the `levels` argument in the `factor` function
]

.pull-right[
```r
f <- factor(c("ACT", "PBO", "PBO", "ACT"))
f
```

```
## [1] ACT PBO PBO ACT
## Levels: ACT PBO
```

```r
f <- factor(c("ACT", "PBO", "PBO", "ACT"), levels = c("ACT", "PBO"))
f
```

```
## [1] ACT PBO PBO ACT
## Levels: ACT PBO
```
]

---
.center[
# Beyond Atomic Vectors: Lists!
]

.pull-left[
- Lists group together R objects, such as atomic vectors

- Lists can contain many different data types
    - e.g., double vectors, character vectors, other lists

- Use the `list` function to create a list

- Lists are very important in R but can get complicated quickly
    - We won't use lists much in this course, except...
]

.pull-right[
```r
list(1:5, c("Here", "I", "Am"), c(T, T, F))
```

```
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "Here" "I"    "Am"
##
## [[3]]
## [1]  TRUE  TRUE FALSE
```
]

---
.center[
# Beyond Atomic Vectors: Data Frames!
]

.panelset[
.panel[.panel-name[Data Frames Background]
.pull-left[
- Data frames are the two-dimensional version of a list: a list of equal-length vectors
    - Vectors are columns in a data frame
    - Because data frames are lists, the columns (vectors) can be of different types
    - However, the cells (elements) within a column (vector) must all be of the same type

- Data frames are the most common way to store data for analysis in R, and we'll use them A LOT over the coming weeks
]

.pull-right[
.center[Air Quality Data Frame]

```
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
```
]
]
<!---->
.panel[.panel-name[Data Frames In Action]
.pull-left[
- Use `data.frame` to create your own data frame
    - Note that data frames are lists

```r
dat <- data.frame(
  x = c(1, 2, 3),
  y = c(TRUE, FALSE, FALSE),
  z = c("Bob", "Phil", "Jerry")
)
typeof(dat)
```

```
## [1] "list"
```
]

.pull-right[
- Data frames can contain different types of data in their columns
    - To see the data type of each column, use the `str` (short for structure) function

```r
str(dat)
```

```
## 'data.frame': 3 obs. of  3 variables:
##  $ x: num  1 2 3
##  $ y: logi  TRUE FALSE FALSE
##  $ z: chr  "Bob" "Phil" "Jerry"
```
]
]
<!---->
]
<!--end of panelset-->

---
.center[
# One More Time, Please!
]

.pull-left[
- Data can be stored in **atomic vectors**
<br>
<br>
- There are four main types of atomic vectors: **double, integer, character, and logical**
<br>
<br>
- Conflicting data types are **coerced** following a consistent set of rules
<br>
<br>
- **Factors** are well-suited for handling categorical data
<br>
<br>
- **Lists** allow for different data types to co-exist
<br>
<br>
- **Data Frames** are two-dimensional lists that are used frequently in data analysis
]

.pull-right[
.center[
<br>
<br>
<img src="img/mod2/review.jpg" height="266px" width="399px" />
<figcaption> Photo by Markus Winkler </figcaption>
]
]

---
class: bottom right hide_logo
background-image: url(img/mod2/vw.jpg)
background-size: cover

.left[.footnote[Photograph by [Tommy Lisbin](https://unsplash.com/photos/xr-y6Ruw7K8)]]

# Off to the Tidyverse...